On Accuracy of PDF Divergence Estimators and Their Applicability to Representative Data Sampling

Authors

  • Marcin Budka
  • Bogdan Gabrys
  • Katarzyna Musial
Abstract

Generalisation error estimation is an important issue in machine learning. Cross-validation, traditionally used for this purpose, requires building multiple models and repeating the whole procedure many times in order to produce reliable error estimates. It is, however, possible to accurately estimate the error using only a single model if the training and test data are chosen appropriately. This paper investigates the possibility of using various probability density function (PDF) divergence measures for the purpose of representative data sampling. As it turns out, the first difficulty one needs to deal with is estimation of the divergence itself. In contrast to other publications on this subject, the experimental results provided in this study show that in many cases such estimation is not possible unless samples consisting of thousands of instances are used. Exhaustive experiments on divergence-guided representative data sampling have been performed using 26 publicly available benchmark datasets and 70 PDF divergence estimators, and their results have been analysed and discussed.


Similar resources

Study on the effect of forest stand distribution pattern on results of different estimators of the nearest individual distance method

The Nearest Individual Sampling Method is one of the distance sampling methods for estimating the density, canopy cover and height of forest stands. Some distance sampling methods have more than one density estimator, and these may be skewed by the spatial pattern unless the stands under study have a random spatial pattern. Therefore, the purpose of this study was to evaluate the effect of s...

Some Statistical Inferences on the Parameters of Records Weibull Distribution Using Entropy

In this paper, we discuss different estimators of the records Weibull distribution parameters and apply the Kullback-Leibler divergence of survival function method to estimate these parameters. Finally, the estimators are compared using Monte Carlo simulation, and good estimators are suggested.

Estimation of Variance of Normal Distribution using Ranked Set Sampling

Introduction: In some biological, environmental or ecological studies, there are situations in which obtaining exact measurements of sample units is much harder than ranking them in a set of small size without referring to their precise values. In these situations, ranked set sampling (RSS), proposed by McIntyre (1952), can be regarded as an alternative to the usual simple random sampling ...

A Comparison of Several Robust Estimators for a Finite Population Mean

In survey sampling, ratio and regression estimators are often used to estimate the mean of a finite population. These estimators make use of information on an auxiliary variable that is assumed to be available over the entire population. Generally speaking, the higher the correlation between the response and this auxiliary variable, the more efficient the ratio and regression estimators will be...

Nonparametric Estimation of Conditional Information and Divergences

In this paper we propose new nonparametric estimators for a family of conditional mutual information and divergences. Our estimators are easy to compute; they only use simple k nearest neighbor based statistics. We prove that the proposed conditional information and divergence estimators are consistent under certain conditions, and demonstrate their consistency and applicability by numerical ex...

Journal:
  • Entropy

Volume 13

Publication date: 2011